-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[flang-rt] Add Assign_omp RT call. #145465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
allocatable.cpp | ||
array-constructor.cpp | ||
assign.cpp | ||
assign_omp.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's a good idea to make the Fortran runtime depend on the OpenMP runtime library. I think it makes more sense to have this routine live in the OpenMP offload runtime as a potentially generic API entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortran runtime function implementations exist in the OpenMP runtime library like this: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_ftn_entry.h . If it is offload-only, it may also live in libomptarget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context:
Assign_omp just does omp_target_memcpy between two device pointers.
void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from, const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device)
This api is required, when hoisting "fir.call @_FortranAAssign(...)" from omp.target to the host in "lower-workdistribute" pass #140523
Descriptor struct is defined in flang-rt/runtime/descriptor.h.
Issue:
Now if I need to move this implementation to openmp runtime or libomptarget, I wouldn't have access to fortran-rt Descriptor structure there. Is there any solution to deal with such issue?
Probable solution:
Instead of creating new runtime api, may be, make call to "omp_target_memcpy" directly by extracting the ptrs from Descriptor structure at MLIR stage? Need to check if this is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two options:
- Use the CFI of Fortran to access the descriptor. It will sort of tie the OpenMP runtime to the Fortran runtime, but it will do so in an ISO-conforming way.
- Unpack the descriptor in the code-gen for the assign operation. It might be prudent to provide an accessor function for getting the pointers and size information from the descriptor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding tests as well
allocatable.cpp | ||
array-constructor.cpp | ||
assign.cpp | ||
assign_omp.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortran runtime function implementations exist in the OpenMP runtime library like this: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_ftn_entry.h . If it is offload-only, it may also live in libomptarget.
// If not present on the device it should already be a device ptr | ||
if (!omp_target_is_present(voidAnyPtr, ompDevice)) | ||
return anyPtr; | ||
T *device_ptr = omp_get_mapped_ptr(anyPtr, ompDevice); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use the same style for variables names: devicePtr
.
|
||
typedef int32_t OMPDeviceTy; | ||
|
||
template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for easier reasability:
template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) { | |
template <typename T> static T *getDevicePtr(T *hostPtr, OMPDeviceTy ompDevice) { |
std::size_t fromElements{from.Elements()}; | ||
|
||
if (toElementBytes != fromElementBytes) | ||
terminator.Crash("Assign: toElementBytes != fromElementBytes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging the number of element bytes (and elements below) in the crash might be helpful when debugging.
terminator.Crash("Assign: toElements != fromElements"); | ||
|
||
// Get base addresses and calculate length | ||
void *to_base = to.raw().base_addr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Naming style is different in this function as well.
} | ||
|
||
RT_API_ATTRS static void Assign(Descriptor &to, const Descriptor &from, | ||
Terminator &terminator, int flags, OMPDeviceTy omp_device) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flags
is not used. Do we need it here?
std::size_t toElementBytes{to.ElementBytes()}; | ||
std::size_t fromElementBytes{from.ElementBytes()}; | ||
std::size_t toElements{to.Elements()}; | ||
std::size_t fromElements{from.Elements()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want to check also that descriptors are contiguous. You can have the same number of elements but the stride might be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want it to work also on non contiguous descriptors, the Assign function as a mechanism to pass memmove function to use.
//===-- lib/runtime/assign_omp.cpp ----------------------------------*- C++ | ||
//-*-===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Format
Will this add a hard OpenMP dependency for anyone building the flang runtime? This should probably be placed in a separate library just for OpenMP like it is done for CUDA Fortran. |
@clementval As discussed in #152756 and based on your feedback its better to not have openmp rt dependency on flang-rt. Closing this PR. |
Add runtime Assign_omp call which does memcpy of ptrs on device using "omp_target_memcpy".
void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from, const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device)
This PR is pre-requisite for workdistribute lowering PR #140523